The Diary

Train Big, Plan Smart - How to Calculate Memory and Estimate GPUs for LLMs

Unlocking the Basics

Posted on May 23, 2025

Training large language models isn’t just a question of whether you can do it; it’s a question of how smartly you do it. If you’ve ever wondered how researchers train massive AI models with billions of parameters, it all starts with smart planning. Behind every successful LLM training run is a... [Read More]

Tags: LLMTraining, Memory estimation, GPU sizing, Generative AI, Model Scaling

Weight Initialization - The First Principle

A journey from the basics to the advanced.

Posted on April 13, 2020

Weight initialization is the most underrated concept in deep learning. I have seen many new deep learning practitioners, and even some experienced ones, ignore this important concept. Unlike some existing tutorials and blogs, we will not talk about why you should not initialize your weights with all... [Read More]

Tags: weight, initialization, deep learning, xavier, kaiming, eigenvalue, eigenvector